# When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

## 📖 Overview

**ACQO (Adaptive Complex Query Optimization)** is a novel reinforcement learning framework designed to optimize complex queries in Retrieval-Augmented Generation (RAG) systems. Unlike existing approaches that focus on single query expansion, ACQO adaptively handles complex real-world queries requiring multiple parallel and sequential search strategies.

### Key Features

- **🔄 Adaptive Query Reformulation (AQR)**: Dynamically decides when to decompose queries into sub-queries
- **🔀 Rank-Score Fusion (RSF)**: Robust result aggregation with stable reward signals
- **📚 Curriculum Reinforcement Learning (CRL)**: Two-stage training strategy for improved stability
- **⚡ High Efficiency**: Improved computational efficiency with broad retrieval architecture compatibility
- **🎯 State-of-the-Art**: Superior performance on three complex query benchmarks

## 🏗️ Repository Structure

```
ACQO/
├── data/                          # Dataset storage and preprocessing
├── evaluation/                    # Evaluation scripts and metrics
├── local_index_search/           # Local search index utilities
├── recipe/                       # Training configurations and recipes
│   ├── dapo/                     # DAPO algorithm configurations
│   ├── drgrpo/                   # DRG-RPO algorithm configurations  
│   ├── prime/                    # PRIME algorithm configurations
│   ├── r1/                       # R1 algorithm configurations
│   └── sppo/                     # SPPO algorithm configurations
├── scripts/                      # Training and execution scripts
├── search_launch/                # Retrieval service launch scripts
│   ├── ance/                     # ANCE retrieval service
│   └── lucene/                   # Lucene retrieval service
├── src/                          # Core source code
├── test_data/                    # Test datasets and examples
└── verl/                         # VERL framework components
    ├── models/                   # Model implementations
    │   ├── llama/                # LLaMA model support
    │   ├── mcore/                # Megatron core integration
    │   ├── qwen2/                # Qwen2 model support
    │   └── transformers/         # Transformers integration
    ├── single_controller/        # Single controller implementations
    ├── third_party/              # Third-party integrations
    │   ├── sglang/               # SGLang integration
    │   └── vllm/                 # vLLM integration
    ├── tools/                    # Utility tools
    ├── trainer/                  # Training framework
    │   ├── config/               # Training configurations
    │   └── ppo/                  # PPO trainer implementation
    ├── utils/                    # Utility functions
    │   ├── checkpoint/           # Checkpoint management
    │   ├── dataset/              # Dataset utilities
    │   ├── debug/                # Debugging tools
    │   ├── logger/               # Logging utilities
    │   ├── megatron/             # Megatron utilities
    │   ├── metric/               # Evaluation metrics
    │   ├── rendezvous/           # Distributed training utilities
    │   └── reward_score/         # Reward scoring functions
    ├── version/                  # Version management
    └── workers/                  # Distributed worker implementations
        ├── actor/                # Actor workers
        ├── critic/               # Critic workers
        ├── reward_manager/       # Reward management
        ├── reward_model/         # Reward model workers
        ├── rollout/              # Rollout workers
        └── sharding_manager/     # Sharding management
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA 11.8+ (for GPU support)
- 16GB+ RAM recommended
- Multiple GPUs recommended for distributed training

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/your-org/ACQO.git
cd ACQO
```

2. **Create virtual environment**
```bash
conda create -n acqo python=3.8
conda activate acqo
```

3. **Install dependencies**
```bash
pip install -r requirements.txt
```

4. **Install the package**
```bash
pip install -e .
```

### Data Preparation

1. **Download datasets**
```bash
# Download and prepare evaluation datasets
python scripts/prepare_data.py --dataset all
```

2. **Build search indices**
```bash
# Build local search indices
python scripts/build_index.py --index_type lucene --data_path data/
```

## 🎯 Usage

### Training

#### Stage 1: Curriculum Learning - Simple Queries
```bash
# Train on simple queries first
python scripts/train_stage1.py \
    --config recipe/r1/config/stage1.yaml \
    --output_dir outputs/stage1 \
    --num_gpus 4
```

#### Stage 2: Curriculum Learning - Complex Queries
```bash
# Train on complex queries
python scripts/train_stage2.py \
    --config recipe/r1/config/stage2.yaml \
    --checkpoint outputs/stage1/best_model.pt \
    --output_dir outputs/stage2 \
    --num_gpus 4
```

#### Alternative Training Recipes

**SPPO Training:**
```bash
python scripts/train_sppo.py \
    --config recipe/sppo/config/default.yaml \
    --output_dir outputs/sppo
```

**DAPO Training:**
```bash
python scripts/train_dapo.py \
    --config recipe/dapo/config/default.yaml \
    --output_dir outputs/dapo
```

### Retrieval Service Setup

#### Launch Lucene Service
```bash
cd search_launch/lucene
python launch_lucene_server.py \
    --port 8080 \
    --index_path ../../data/indices/lucene
```

#### Launch ANCE Service
```bash
cd search_launch/ance
python launch_ance_server.py \
    --port 8081 \
    --model_path ../../models/ance \
    --index_path ../../data/indices/ance
```

### Evaluation

#### Single Query Evaluation
```bash
python evaluation/evaluate_single.py \
    --model_path outputs/stage2/best_model.pt \
    --test_data test_data/single_queries.json \
    --retrieval_service http://localhost:8080
```

#### Complex Query Evaluation
```bash
python evaluation/evaluate_complex.py \
    --model_path outputs/stage2/best_model.pt \
    --test_data test_data/complex_queries.json \
    --retrieval_service http://localhost:8080 \
    --metrics ndcg,map,recall
```

#### Benchmark Evaluation
```bash
# Evaluate on all benchmarks
python evaluation/run_benchmarks.py \
    --model_path outputs/stage2/best_model.pt \
    --benchmarks msmarco,nq,hotpotqa \
    --output_dir results/
```

### Inference

#### Interactive Query Optimization
```python
from src.acqo import ACQOOptimizer

# Initialize the optimizer
optimizer = ACQOOptimizer(
    model_path="outputs/stage2/best_model.pt",
    retrieval_service="http://localhost:8080"
)

# Optimize a complex query
query = "What are the environmental impacts of renewable energy sources compared to fossil fuels in developing countries?"
optimized_queries = optimizer.optimize(query)
results = optimizer.retrieve_and_rank(optimized_queries)

print(f"Original query: {query}")
print(f"Optimized queries: {optimized_queries}")
print(f"Top results: {results[:5]}")
```

#### Batch Processing
```bash
python scripts/batch_optimize.py \
    --input_file queries.txt \
    --output_file optimized_results.json \
    --model_path outputs/stage2/best_model.pt \
    --batch_size 32
```

## 📊 Experimental Results

### Performance on Complex Query Benchmarks

| Method | MS MARCO | Natural Questions | HotpotQA | Average |
|--------|----------|-------------------|----------|---------|
| BM25 | 0.187 | 0.320 | 0.285 | 0.264 |
| DPR | 0.314 | 0.411 | 0.387 | 0.371 |
| ColBERT | 0.360 | 0.446 | 0.421 | 0.409 |
| Query2Doc | 0.378 | 0.463 | 0.445 | 0.429 |
| **ACQO (Ours)** | **0.425** | **0.521** | **0.498** | **0.481** |

### Computational Efficiency

| Method | Training Time | Inference Time | Memory Usage |
|--------|---------------|----------------|--------------|
| Baseline RL | 48h | 2.3s | 12GB |
| **ACQO** | **36h** | **1.8s** | **8GB** |

## 🔧 Configuration

### Training Configuration Example

```yaml
# recipe/r1/config/stage1.yaml
model:
  name: "llama-7b"
  max_length: 512
  
training:
  batch_size: 16
  learning_rate: 1e-5
  num_epochs: 10
  warmup_steps: 1000
  
aqr_module:
  max_subqueries: 5
  decomposition_threshold: 0.7
  
rsf_module:
  fusion_method: "weighted_sum"
  score_normalization: true
  
curriculum:
  stage1_epochs: 5
  complexity_threshold: 0.5
```

### Retrieval Service Configuration

```yaml
# search_launch/lucene/config.yaml
lucene:
  index_path: "data/indices/lucene"
  analyzer: "standard"
  similarity: "bm25"
  
server:
  host: "0.0.0.0"
  port: 8080
  workers: 4
```

## 🧪 Testing

### Unit Tests
```bash
python -m pytest tests/unit/ -v
```

### Integration Tests
```bash
python -m pytest tests/integration/ -v
```

### End-to-End Tests
```bash
python -m pytest tests/e2e/ -v
```

## 📈 Monitoring and Logging

### TensorBoard Monitoring
```bash
tensorboard --logdir outputs/logs --port 6006
```

### Weights & Biases Integration
```bash
# Set your W&B API key
export WANDB_API_KEY=your_api_key

# Training with W&B logging
python scripts/train_stage1.py \
    --config recipe/r1/config/stage1.yaml \
    --wandb_project acqo-experiments \
    --wandb_run_name stage1-experiment
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run code formatting
black src/ tests/
isort src/ tests/

# Run linting
flake8 src/ tests/
```

## 📄 Citation

If you use ACQO in your research, please cite our paper:

```bibtex
@inproceedings{acqo2026,
  title={ACQO: Adaptive Complex Query Optimization for RAG Systems},
  author={Your Name and Co-authors},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
```

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Thanks to the open-source community for the foundational tools and libraries
- Special thanks to the VERL framework contributors
- Inspired by recent advances in retrieval-augmented generation and reinforcement learning

## 📞 Contact

- **Authors**: [Your Name](mailto:your.email@institution.edu)
- **Issues**: Please use the [GitHub Issues](https://github.com/your-org/ACQO/issues) page
- **Discussions**: Join our [GitHub Discussions](https://github.com/your-org/ACQO/discussions)

## 🔗 Related Work

- [VERL Framework](https://github.com/volcengine/verl)
- [Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)
- [Dense Passage Retrieval](https://arxiv.org/abs/2004.04906)
- [ColBERT](https://arxiv.org/abs/2004.12832)

---

**Note**: This is research code. For production use, please ensure thorough testing and validation in your specific environment.